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ABSTRACT 

We introduce to as trophysics the threshold probabihty functions 82,02, and D2 first derived by 
iTorguato et al.l ()1988l ). which effectively samples the fiux probability distribution (PDF) of the Lya 
forest at different spatial scales. These statistics are tested on mock Lya forest spectra based on 
various toy models for He II reionization, with homogeneous models with various temperature-density 
relations as well as models with temperature inhomogeneities. These mock samples have systematics 
and noise added to simulate the latest Sloan Digital Sky Survey Data Release 7 (SDSS DR7) data. 
We find that the flux PDF from SDSS DR7 can be used to constrain the temperature-density relation 
7 (where r ex (1 -I- A)'''"^) of the intergalactic medium (IGM) at z = 2.5 to a precision of A7 = 0.2 at 
~ 4(7 confidence. The flux PDF is degenerate to temperature inhomogeneities in the IGM arising from 
He II reionization, but we find ^2 can detect these inhomogeneities at ~ 3a, with the assumption that 
the flux continuum of the Lya forest can be determined to 9% accuracy, approximately the error from 
current fitting methods. If the fiux continuum can be determined to 3% accuracy, then 5*2 is capable 
of constraining the characteristic scale of temperature inhomogeneities, with ~ 4(t differentiation 
between toy models with hot bubble radii of 50 h^^ Mpc and 25 h^^ Mpc. 

Subject headings: cosmology: theory — intergalactic medium — quasars: absorption lines — methods: 
data analysis — large-scale structure of universe 



1. INTRODUCTION 

The absorption of radiation by the Lyman-a (Lya) 
resonance of neutral hydrogen along the line-of-sight 
to high-redshift quasars, commonly known as the Lya 
forest, is an import ant probe of large- s cale structure at 
z > 2 (see, e.g. iCroft et a.Ll 119981: iMcDonald et all 
200a [Croft et alJ |2002[ iZaldarriaga et al.l 120031 



McDonald et al.l 120051 ). The utility of the Lya forest as 



a cosmological tool has been enabled by theor etical work 
on th e intergalactic medium (IGM) ( see, e.g..lCen et al.l 
1991 iMiralda-Escude et all Il996l: ICroft et al.l " ITgga 



Davi et al.|[l999HTheuns et al.lll998l ). which allowed the 
underlying dark matter field to be mapped from the 
Lya absorption. 

In recent years, increasing attention has been turned 
towards obtaining a deeper understanding on the de- 
tailed astrophysics of the IGM such as the ionizing ul- 
traviolet background, temperature field, metals, etc. The 
reionizations of H I and He II have been shown to play 
critical roles in regulating the properties of the IGM. 

The reionization of He II at 2 ^^ 3 should leave an ob- 
servable imprint on the properties of the Lya forest. Re- 
cent theoretical developments have recognized that He II 
reionization must have occurred in an extended and inho- 
mogeneous fash ion as the process was driven by rare and 
bright quasars 



iMcQuinn et al.l 



Lai et al.l 12001 iFurlanetto fc Ohll2008bl: 



20091 ). These spatial variations in He II 



reionization history should modulate the equation of 
state and entropy of the IGM. 

Observations of the Lya forest have the potential to 
reveal details of this reionization process. The sources of 
E > 54.4eV photons which drive He II reionization are 

|lee@astro. Princeton. edu| 
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believed to be rare bright quasars, but different model 
assumptions on, e.g., quasar duty cycles and luminos- 
ity functions can dramatically c hange the history and 
morphology of He II reionization (jMcQuinn et al.|[20 091 , 
as well as the th ermal p roperties of the resulting IGM 
(jFurlanetto fc Ohl[2008al) . 

Observational constraints on He II reionization 
have arguab l y lag ged behind the theoretical work. 
iSchave et al.l (|2000[ ) measured a peak at z « 3 in the 
temperature evolution of the IGM from the Doppler 
parameters of high-resolution Lya forest spectra, that 
they claimed to be due to He II reionization. While 
some authors JT hcuns et al. 2002; Bernardi ct al. 2003; 
iFaucher-Giguere et al.ll20 08) argue that the detection of 
the feature at z ~ 3.2 in the evolution of the effective Lya 
optical depth. Toff, of the IGM p rovides further e v idence 
for the reionization transition, iDall'Aglio et al.l ()2009l ) 
failed to find this transition. More recently, the Cosmic 
Origins Spectrograph on the Hubble Space Telescope is 
beginning to shed light on He II reionization through 
studies of the He II Lya forest and associated H e II 
Gunn-Peterson troughs (see, e.g., iShull et aLll2010[ ). 

Measure ments of the flux probabilit y distribution func- 
tion (PDF iJenkins fc Ostriken 119911) of high-res o lution 

Lyg fores t spe ctra (e.g., McDonald et al.l 120001 : 

iLidz et al.ll2006t iKim et al.l l 2007[ ) have constrained the 
equation of state 7 of the IGM. In this paper, we define 
the equation of state of the gas as a function only of the 
gas density: 



r(A) 



(1) 



where T{A) is the temperature as a function of density, T 
is the temperature at mean density, and A = p/{p) is the 
density contrast of the gas. Different scenarios for He II 
reionization make distinctive predictions for the red- 
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shift evolution o f 7. Several authors (jBecker et al.ll2007l : 
iViel et all |2009[ ) claimed to have detected an inverted 
equation of state 7 < 1, which is has been theorized to 
arise fr o m the late reioniza tion of voids (jFurlanetto fc OhI 
I2008al) . ILidz et all (j200iy ) attempted to measure spatial 
inhomogeneities in the thermal state of the IGM by us- 
ing a wavelet filter on high-resolution spectra, but had a 
null detection. 

Most observational attempts to place constraints on 
the IGM have been based on relatively small-numbers 
of high-resolution {R = A/AA ^ 10"*) and high signal- 
to-noise (S/N ^ 10^ per pixel) spectra. However, the 
largest single source of data on the Lyg forest is ar - 
guably the Sloan Digital Sky SurvejQ (jYork et al.ll20M ). 
which includes ^10* quasars with useable Lya forest, 
albeit of moderate quality {R w 2000, S/N - 4). In the 
near future, the Baryon Oscillation Spectroscopic Survey 
(BOSS, part of SDSS-IIJj) aims to increase the sample 
size of high-redshift (z > 2) quasars to > 10*^ at a similar 
spectral resolution to the SDSS. 

Many of the techniques (Voight profile fitting, wavelet 
analysis etc.) developed for probing the IGM are un- 
suitable for use with the lower quality of the SDSS data, 
while the flux statistics that have been measured for the 
SDSS Lya forest, such as the flux power spectrum, are 
relatively insensiti ve to the detaile d astrophysics of the 
IGM. For example. iLai et all (|2006( ) have shown that the 
SDSS flux power spectrum is insensitive to large-scale 
temperature inhomogeneities arising from He II reioniza- 
tion. 

In this paper, we borrow some statistics used to 
measure mechanical and transport properties of disor- 
dered media in the material sciences, which we term the 
'threshold probability functions', and explore their appli- 
cations to mock Lya forest spectra based on the SDSS 
sample. We will generate simple toy models for the IGM 
based on different scenarios for He II reionization, and 
explore the ability of the threshold probability functions 
to distinguish between them. 

We first introduce and define these statistics in fJ21 be- 
fore digressing to discuss the simulations and toy models 
we use to generate the mock spectra in ij3] In S|4] we 
calculate the fiux PDF as a check on our errors, before 
going on to apply the threshold statistics on the mock 
data in !j5l 

2. DEFINITION OF THRESHOLD STATISTICS; 
52, C2, AND D2 

In the study of the Lya forest, the two-point flux cor- 
relation function is commonly defined as: 



ar) = {Spir')6Ur'+r)}, 



(2) 



where r is the comoving distance between two points in 
the line-of-sight of the background quasar (equivalently 
expressed as redshift, wavelength, or velocity intervals 
within the observed spectrum), and 



6 Fir) 



F 



1 



(3) 



where F{r) = e '^'^''^ is the flux transmitted through the 
Lya optical depth T{r) at a given point, and F is the 
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Fig. 1. — A segment of transmitted Lya forest flux plotted as a 
function of comoving distance r along the line-of-sight, illustrating 
the definition of the threshold probability functions C2 and D2 ■ All 
pixels above the flux threshold (-Ft^ = 0.75 in this example) con- 
tribute to C2 , D2 , and 52 = (^2 -I- D2 ■ The arrows at left illustrate 
a pixel pair within the same 'cluster' of pixels which contribute to 
C2, while the arrows at right denote another pixel pair in different 
clusters which contribute to D2- 

mean transmitted flux in the spectrum. Another com- 
monly used statistic is the Fourier transform of S,f{t), 
the flux power spectrum 



PF{k) 



Ar/2 
'Ar/2 



^F{r')e'^''' dr' 



(4) 



computed over the interval Ar, and k is the wavenumber. 
In this paper we introduce to astrophysics several 
related two-point statistics used in material science 
(|Torquato et all 119881 : iJiao et al.l[200l . For a volume 
that is occupied by a two-phase medium, for any one of 
the phases (say phase i) we can define 



5'2(ri,rl) = C2(rl,r^) + £'2(?'l,rl). 



(5) 
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5*2 is the probability function of finding both points rl 
and r2 in phase z, while C2 and D2 denote the proba- 
bility of finding points of the phase i within the same 
'cluster' (in this paper referring to contiguous groupings 
of pixels, not galaxy or star clusters), and in separate 

clusters, respectively. This is illustrated in Figure [TJ C2, 
known as the 'cluster function', en codes higher-ord er not 
contained in two-point statistics piao et al.ll2009D . For 
astrophysical purposes we assume an isotropic and ho- 
mogeneous field, so 5'2(rl,r5) = 5*2(1^1 - r^l) = S2{r) 
etc. 

We generalize these statistics for use with the Lya for- 
est by defining them as a function of flux threshold, Fth- 
At each value of Ftin we divide the spectrum into high 
[F > Fth) and low [F < Fth) regions and then compute 
the clustering properties for these phases. In this paper 
we focus on the high (F > Fth) phases which are more 
sensitive to He II reionization. This defines two dimen- 
sional functions, S2{r,Fth), C2{r,Fth), and D2{r,Fth)- 
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The threshold probability function at zero lag, 

S2iO\Fth) is directly related to the familiar flux proba- 
bility distribution function (PDF), p{F): 



S2{0\Fth) = / p{F)dF. 



(6) 



Fth 



The number of possible pixel pairs (and hence 6*2) de- 
creases linearly with r within a finite length L. In order 
to take this effect into account we introduce a rescaled 
version of S2 '■ 



(7) 



S2ir\Fth) = S2ir\Fth) {I - J- 



and similarly for C2 and D2. At large separations, 
S2{r\Fth) — >■ S'|(0|i^tft) if there is no large-scale order 
in the system. We refer to S2{r\Fth), C2{r\Fth), and 
F>2{r\Fth) collectively as the 'threshold probability func- 
tions'. Intuitively, they can be thought of as the flux 
PDF evaluated as a function of correlation length. 

These threshold probability functions should not be 
confused with the threshold crossing statistics that 
counts the number of times that the observed Lya for- 
est transmission spectrum intersects a given flux value 
per u nit redshift (jMiralda-Escude et al.lll996t iFan et all 
l2002f ). The threshold crossing statistic is a generalization 
of the forest line density and does not contain spatial in- 
formation, unlike the threshold probability functions. 

While He II reionization may create a two- 
phase thermal struc ture in the IGM (jLai et al.l 120061 : 
iMcQuinn et al.l [20091 ) . its imprint on the Lya forest will 
not be manifested as distinct phases in the flux distribu- 
tion of the Lya forest spectra. It is the density field which 
predominantly determines the distribution of low- and 
high- absorption regions. These thermal inhomogeneities, 
however, do modulate the amplitude of the optical depth 
and leave a potentially statistically detectable effect es- 
pecially in the low-absorption regions. It is this effect 
which we are attempting to detect using the threshold 
clustering functions. 

3. Lya FOREST MODEL 

In this section, we describe a series of simulations of the 
Lya forest that we will use to test ability of the thresh- 
old probability functions to distinguish between different 
He II reionization histories. 

3.1. Simulations 

We us e a set of publicly availabl43 mock Lya forest 
spectra (jSlosar et al.l 120091 ) that have been generated 
from dark matter-only particle-mesh simulations based 
on a flat ACDM cosmology with Ha — 0.75, f^A/ = 
0.25, h = 0.75, n — 0.97, and as — 0.8. The simulations 
evolved 3000'^ particles in a 1500ft.~^Mpc box with the 
forces computed on a 3000^ grid. Density and velocity 
fields were then generated using spline-kernel interpola- 
tion with an effective smoothing radius of 250/i^^kpc, 
and line-of-sight skewers extracted at redshift z = 2.5 
with a spacing of 10/i~^Mpc to provide 150^ — 22500 
skewers from each run. 

^ |http : //mwhlte . berkeley ■ edu/BOSS/LyA/Franklln/ 1 



The Lya optical depth in each pixel was then gener- 
ated usin g the fluctuating Gunn-Peterson approx imation 
(FGPA, ICroft et allHOOllGnedin fc Huilll998aD : 



T OC T-0-7a2-0-7(7-1) 1 



1 dv> 



pec 



H{z) dx 



(8) 



where A = p/ p is the density perturbation, H{z) is the 
Hubble parameter at redshift z, dvpec/dx is the peculiar 
velocity gradient, T is the temperature at mean density, 
and 7 parametrizes the equation of state between den- 
sity and temperature (see Equation [ij . In these runs 
T = 2 X W^K and 7 — 1.5 was assumed. In addition 
to the optical depth skewers, matching skewers of the 
underlying density were also made available. 

These dark matter simulations are not expected to ac- 
curately capture the small-scale power of the Lya forest 
as hydrodynamics are not included. However, they pro- 
vide a fiducial set of mock spectra from which it is easy to 
generate different toy models of the IGM by adjusting the 
optical depths using the FGPA (Eq. [S]). This provides a 
convenient testing ground for the ability of the threshold 
probability functions to distinguish between differences 
arising from different IGM models. 

3.2. Homogeneous IGM Models 

The fiducial set of simulated spectra described above 
assumes a homogeneouci IGM in which T = 2 x W^K 
and 7 = 1.5. This is the value of 7 at which un- 
shocked gas settles at z ^ 3 after a hydro gen reioiiiza- 
tion event at 2 > 6 (|Hui fc Gnedinlll997D . while T = 
2 X W^K is approximately the value obtained through 
line-profile fitting of h i gh-resolution Lya f orest spectra 
(jMcDonald et al.lhoOltlTheuns et al.ll2002D . We refer to 
this fiducial model as 'G1.5'. 

We study the effects of changing the equation of state 
uniformly across the IGM by creating models with 7 
set to 1.3 and 0.8 (models G1.3 and GO. 8, respectively). 
7 — 0.8 represents an inverted equation of state predicted 
by Furlanctto & Oh (2008a) for scenarios in which dense 
regions, which were reionized early on, have had time 
to cool adiabatically to temperatures lower than more 
recently reionized voids. We introduce G1.3 as an in- 
termediate case between G1.5 and GO. 8, although as we 
shall see later, it has a flux PDF which is degenerate with 
our inhomogeneous reionization models. 

The mean temperature T is kept unchanged, as global 
variations in temperature will be manifested as changes 
in the mean flux leveO, which we regard as a flxed param- 
eter by normalizin g our Lya forest spectra to the same 
value of (F) = 0.8 (JMeiksin fc Whitdbool . 

3.3. Inhom,ogeneous IGM Models 

Several authors (jLai et al.ll200l IMcQuinn et~aIll2009H 
have suggested that the reionization of He II by quasars 
at z ~ 3 results an inhomogeneous IGM with signifl- 
cantly heated regions. In this paper we consider a basic 

■^ Note that in this paper 'homogeneity' and 'inhomogeneity' 
refer to the spatial distribution of the IGM thermal properties, 
primarily 7 and/or T. 

^ This is not entirely true as thermal broadening would change 
the small-scale power of the Lyo forest, but due to the low- 
resolution of our simulations and the fact that we are looking for 
large-scale variations, we ignore this effect. 
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Fig. 2. — Examples of pixel masks used to generate the inliomogeneous IGM models, shown here as 2-dimensional cross-sections across 
the box which is 1500 h~^ Mpc on a side. From left to right: Models R50, R25, and 150. White represents the hot IGM phase while black 
is the cold phase with both occupying 50% of the volume. The mock spectra will thus represent 1-dimensional samplings through the hot 
and cold phases shown here. 



picture of inhomogeneous He II reion ization inspired by 
the simulation results of Mc Quinn et al. (2009), in which 
the IGM is composed of hot and cold phases with equal 
volume-filling factors of 50% each. The hot phase has a 
mean temperature Th = 2.5 x W^K and 7 = 1.2, while 
the cold phase has a mean temperature Tc = 1.5 x lO^i^ 
and 7 = 1.5. Note that while changes in the overall mean 
temperature will be absorbed in the mean flux level, the 
contrast between T^ and Tc is more important as it re- 
sults in relative changes of optical depth independent of 
the overall flux normalization. 

We approximate the spatial distribution of the hot and 
cold regions by generating a pixel mask in which bubbles 
of radius Rtub are randomly inserted into a 3-dimensional 
box with a comoving volume (1500 h~^ Mpc)"^, the same 
size as the simulation box. These bubbles, which are al- 
lowed to overlap, are added until 50% of the volume lies 
within bubbles. This allows a full distribution of intersec- 
tion path lengths within the mock spectra, from close to 
zero to several times Rbub as illustrated in Figure [2] Our 
simple model provides a si mulacrum of the comp lex spa- 
tial morphologies seen in iMcQuinn et al.l ()2009() . Note 
that we neglect peculiar velocities as the bubble positions 
are completely random and have no overall correlations 
with large scale-structure. 

The mock spectra from the simulations are then mod- 
ified on a pixel-by-pixel basis depending on whether 
the three-dimensional spatial position of each pixel was 
flagged by the pixel mask: the optical depths of pix- 
els which fall within hot bubbles are rescaled usijig the 
FGPA (Equation [S]) so that they have T = Tu and 
7 = 1.2, while pixels outside the bubbles have T = Tc and 
7 = 1.5. Note that this does not take into account the 
effect of thermal broadening. Although thermal broaden- 
ing primarily affects the small-scale power of the forest, 
it also has a small effect on the large-scale bias. This 
coupling between the large- and small-scale power is not 
well-understood and will need to be accounted for in a 
full data analysis using large-scale hydrodynamical simu- 
lations, but for the approximate treatment in this paper 
we ignore this effect. 

The characteristic scale of the thermal inhomogeneities 



arising from He II reionization are expected to be depen- 
dent on quasar physics such as duty cycles, clustering 
and ionizing spectrum (jMcQuinn et al.l 120091) . We thus 
test the threshold probability functions to this charac- 
teristic scale by introducing models with hot bubbles of 
Rbub — 50/i~-^Mpc and Rbub — 25/i~^Mpc, denoted by 
R50 and R25 respectively. Both these models have the 
same volume filling fraction of 50% for hot bubbles. 

In addition, we created a model, 150, in which the 
topology of He II reionization is inverted, i.e. the IGM is 
composed of cold bubbles of Rbub = 50 h~^ Mpc with the 
surrounding regions filled with the hot phase, with the 
properties of the hot and cold phases identical to those of 
models R50 and R25. Although this model is not physi- 
cally realistic, we include it to test the sensitivity of the 
threshold statistics towards topology. Figure[2]illustrates 
the inhomogeneities in our various models. 

All the inhomogeneous models, R50, R25, and 150, 
have the same total number of pixels intersecting with 
hot regions when averaged across large numbers of spec- 
tra. The differences lie in the manner the pixels are spa- 
tially distributed along the skewers. 

3.4. Mock Spectra 

As the primary purpose of this paper is to study the 
ability of the threshold statistics to constrain the IGM 
using existing SDSS data, in this paper we model the 
instrumental and systematic effects at an approximate 
level sufficient to estimate the errors. A more detailed 
approach is to be deferred to a later data analysis paper. 

First, flux transmission spectra are generated from 
the optical depth skewers from F = exp(— r), with a 
global mean flux set to (F) = 0.8. The spectra are then 
Gaussian-smoothed to the resolution of the SDSS spec- 
tra which have FWHM=150km s~^ (corresponding to 
~ 1.4/1"-'^ Mpc at z = 2.5) and binned into 70kms~^ 
pixels. We then split the 1500 h~^ Mpc simulation skew- 
ers into segments of length 500 h~^ Mpc, approximately 
the comoving distance subtended by individual Lya for- 
est lines-of-sight at z — 2.5. 

In the upper panel of Figure |3] we show the flux trans- 
mission for a segment of noiseless mock Lya forest spec- 
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Fig. 3. — Top panel: Flux transmission spectra from models G1.5 (thin black line), GO. 8 (green dashed line), and R50 (thick red line) for 
one line-of-sight - shown here without noise or other systematics. Bottom panel; Difference in flux between models GO. 8 and G1.5 (green 
dashed line), and between R50 and Gf .5 (solid red line), normalized by the mean flux. The horizontal bars indicate the portions of the 
line-of-sight which intersect with hot bubbles. See electronic edition of the Journal for a color version of this figure. 



truni in the fiducial simulations (model G1.5), along 
with corresponding spectra for models GO. 8 and R50. 
It is difficult to distinguish between the different cases 
by eye in Figure |31 although GO. 8 shows more contrast 
between the high- and low-absorption regions, which is 
due to the stronger scaling of optical depth with density 
(r ex A^~°-^*^'''~^)) when 7 is decreased. 

The lower panel shows the difference in flux relative to 
model G1.5, divided by the mean flux. For model GO. 8 
the increased absorption occurs within the Lya lines 
where the optical depth is high. The flux in R50 shows a 
slight overall increase (~ 5%) within regions which inter- 
sect hot bubbles, relative to the cold regions. While the 
increased transmission from the bubbles can be picked 
out by eye from the difference spectra in Figure |31 a pri- 
ori it would be impossible to identify these regions from 
a given spectrum. 

As the SDSS spectra are relatively noisy, the mock 
samples need to have approximately the correct noise and 
sample properties. We estimated the signal-to-noise ratio 
per pixel (S/N) in the Lya forest of the SDSS data in 
following manner: fro m the latest SPSS Da ta Release 7 
(DR7) quasar catalog ( Schneider et al.l[2010( l we selected 
quasars in the redshift range 2.4 < Zgso < 2.7. In the 
wavelength region (l-|-z^5o)1040l < A < (1 + Zgso)1180l 
(the typical range of usable Lya forest), the S/N for each 
sightlinc is then estimated as the median of /A,i/<7Af,i 
(where fx^i and <7N,i are the observed flux and pipeline 
noise from the individual pixels). While the absorption 
of metals such as C IV and O VI are not included in this 
simplified model, a full analysis of actual SDSS spectra 
should use metal absorption measurements from high- 
resolution data to estimate this systematic. 

Within the aforementioned redshift range there are 
3217 quasars in DR7, of which 1641 have S/N > 4 in 
the Lya forest. We thus set our mock sample size to be 
1500 spectra with S/N = 4 per pixel, in which Gaussian 
noise with variance aj^ = (i^/S/N)^ was added to each 



pixel in the mock spectra to simulate the instrumental 
noise, where F is the mean flux within each skewer. 

While the instrumental modelling is carried out at a 
relatively simple level in this paper, the actual instru- 
ment al systematics of the SDSS data are well under stood 
(e.g. iStoughton et al.l[200l iMcDonald et all 120061 ) and 
should be included in any analysis of real data. The 
biggest known unknown is the ability to accurately to fit 
the quasar continuum. 

4. QUASAR CONTINUUM FITTING AND FLUX PDF 

Since the threshold probability functions are evalu- 
ated as a function of the transmitted fiux in the Lya 
forest, they are sensitive to the fitting of the intrinsic 
quasar continuum. The fitted continuum fixes the zero- 
absorption level and determines the normalization of the 
flux transmission in the Lya forest. The uncertainties in 
continuum fltting are likely the dominant source of sys- 
tematic error and are not as well understood as the SDSS 
instrumental noise. 

In this section we discuss the continuum-fitting meth- 
ods that have been published in the literature as well as 
their associated errors, and compute the flux PDF as a 
check on our assumed errors. We will see that the flux 
PDF from the SDSS data in itself can put interesting 
constraints on the IGM. 

4.1. Continuum Fitting 

iDesjacques et al.l (|2007l ) carried out a study of the sys- 
tematics involved in measuring the Lya forest flux PDF 
from the SDSS (albeit from the earlier Data Release 3 
(DR3)), by comparing it with mock spectra generated 
using parameters derived from high-resolution Lya for- 
est spectra. They found that the discrepancies between 
their mock spectra and the measured SDSS PDF can 
largely be accounted for by errors in the quasar contin- 
uum fits. Individual forest spectra were with a power law 
and Gaussian curves for the quasar emission lines (e.g. 
at 1070 A and 1120A in the quasar restframe), a tech- 
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nique first introduced by iBernardi et al.l (|2003D for use 
o n composite quasa r spec tra. 

iDesiacaues et alj ()2007D estimated an error of ap ~ 
20% in their determination of the individual quasar con- 
tinua, although this was derived from the error budget 
of the measured flux PDF rather than from a detailed 
a nalysis of indi v idual fitted continua. 

iSuzuki et alJ ()2005[ ) discussed quasar continuum fit- 
ting using principal component analysis (PCA) based on 
~ 50 ultraviolet quasar spectra obtained from the Hubble 
Space Telescope. At the low-redshifts {z ^ 0.5) of their 
data set there is little Lya forest absorption, which al- 
lows the quasar continuum to be accurately measured in 
wavelength regions which normally suffer from consider- 
able Lya forest absorption. They reported a typical error 
of ap ~ 9% in estimating the Lya forest continuum us- 
ing only points red- wards of the Lya emission line which 
would be unaffected by the Lya forest. In addition, they 
found that the PCA gave good fits for the shape of the 
quasar continuum at the Lya forest wavelengths, even 
when the amplitude was not accurately predicted from 
the red-side of the spectrum. 

While the PCA fitting method on low S/N spectra 
needs to be extensively tested, we expect it to be more 
accurate than the power-law fits as it would in prin- 
ciple account much of the large-scale structure in the 
intrinsic quasar continuum arising from weak emission 
lines, which could be degenerate with large-scale inho- 
mogeneities in the IGM. For th e purposes of our m ock 
spectra we take the findings of ISuzuki et al.l ()2005l ) at 
face-value, and adopt a model for the continuum fitting 
errors in which the shape of the continuum is assumed 
to be perfectly predicted, leaving only a constant nor- 
malization error with no tilt or wiggles in the residual. 
The errors in the continuum level are then obtained by 
normalizing each individual mock spectrum by a local 
mean flux F drawn from a Gaussi an distribution wit h 
a global mean fiux (F) = 0.8 (Mci ksin fc White! I2004D . 
and standard deviation ap — 9%. 

4.2. Flux PDF 

In order to provi de a comparison f or ou r assumed 
systematics vis-a-vis IDesiacaues et al.l ()2007f ) , we eval- 
uate the PDF for our toy models computed from mock 
samples similar to the DR3 data, i.e. 600 spectra with 
S/N = 4 per pixel at z ~ 2.5. 

The resuhant PDFs for models G1.5, G0.8, and R50, 
are shown in the top panel of Figure [4^, with each PDF 
divided into bins of width AF = 0.05. The error bars are 
the 1 (7 dispersion from ~ 100 realizations. These real- 
izations have different random seeds for the instrumental 
noise, and cosmic variance is included by drawing each 
realization from different lines-of-sight in the simulation 
box and generating the bubble distributions separately 
in the case of the inhomogeneous models. The errors are 
plotted explicitly in Figure |3}d. 

The PDFs from our mock spectra have the general 
characteristics expected from noisy Lya forest spectra, 
e.g. the existence of pixels with _F < and F > 1. 
The fact that the plots in Figure [4^ are clearly dif- 
ferent from the P DFs of high-resolution spectra e.g. in 
iKim et al.l (I2007D underhnes the fact that the flux PDF 
is sensitive to noise and other systematics. We do not 
expect our PDF to match the observed SDSS PDFs in 



IDesiacaues et al.l ()2007D exactly as our mock spectra are 
generated from simulations that do not have the right 
small-scale power, nor did we carry out a detailed con- 
sideration of the systematics such as instrumental noise. 
Such a careful approach would be required when analyz- 
ing real data in order to constrain a priori the overall 
Lya forest parameters, but in this paper we are con- 
cerned with the differences arising from the IGM with 
respect to a fiducial model, so the fiducial model itself 
does not need to be exactly right for our tests. 

We use the errors on the PDFs as a check on our 
assumptions regarding the systematics. As expected, 
increasing the scatter ap of the continuum levels in- 
creases the error bars on the PDF. We find that the 
err ors from our m o ck PD Fs, Figure \^, and those 
of iDesiacques et al.l ()2007l ) (Figure 7 in their paper) 
are similar when w e use ap = 9% as suggested by 
ISuzuki et al.l (|2005| ). This indicates that our simplified 
models for the SDSS instrumental effects and sample 
properties provide a reasonable estimate for the sample 
errors. For our DR7 mock samples in the later parts of 
this paper we will henceforth assume that the shape of 
the quasar continuum can be fit perfectly, and set errors 
in the continuum level to ap — 9%. 

The lower panel of Figure 0^ shows the differences 
in PDF with respect to the fiducial model G1.5, 
Ap(F)modoi = p(^)modoi " p(f )gi.5- Interestingly, it 
is clear that even with the DR3 data set it would have 
been possible to distinguish an inverted equation of state, 
GO. 8, from the fiducial model G1.5 at high significance. 

In order to quantify the ability of the PDF to distin- 
guish between the different models, we compute a loga- 
rithmic likelihood 
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^i^fC~Mx,-^^,), (9) 



where we assume the PDF for one model, Xi - the sub- 
script refers to the bins in the data -, to be the 'observed' 
data and take the mean PDF for another model ^i as the 
'theory' points, and C^ is the covariance matrix for xt (in 
this case directly evaluated from the mock realizations). 
A small value of — ln£ '^ 1 indicates that the 'theory' 
model is consistent with the 'observed' model and hence 
the two models cannot be differentiated. As the esti- 
mated errors are similar for all the models as shown in 
Figure Hb, in practice xi and /i^ are interchangeable for 
any two models. When evaluating EquationlHlon the flux 
PDF we use only bins in the range 0.4 < F < 1.1, which 
excludes the lower and upper 10% of the flux distribu- 
tion. 

Table [1] summarizes the results from assuming a DR7 
sample size (1500 quasars) and the systematics we have 
assumed. Each entry in the table shows —InC for dis- 
tinguishing between PDF of the 'observed' model in the 
corresponding column, and the PDF of 'theory' model in 
the corresponding row. 

We find that the PDF can distinguish G0.8 from G1.5 
with a high significance of - ln/:(G0.8|G1.5) = 388.5. 
At first glance, we get the surprising result that the 
PDF can differentiate the inhomogeneous He II reion- 
ization model R50 from the fiducial G1.5 at a signifi- 
cant — ln>C(R50|G'1.5) — 15.4. However, note that the 
models R50 and G1.3 are approximately degenerate with 
— ln>C(R50|G1.3) = 2.3 between the two models. This is 
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Fig. 4. — (a) Top panel shows the flux PDF from 600 mock spectra with S/N = 4 per pixel at z = 2.5, generated for models G1.5 (black 
diamonds), GO. 8 (green triangles), and R50 (red squares). The points are offset horizontally for clarity. Lower panel shows the difference 
in the PDF between models GO. 8 and G1.5 (green triangles), and between models R50 and G1.5 (red squares), (b) Estimated errors on the 
flux PDF from models G1.5 (black solid line), GO. 8 (green dotted line), and R50 (red dashed line). See electronic edition of the Journal 
for a color version of this figure. 
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Note. — Assumes mock SDSS DR7 data set of 
1500 quasars at z = 2.5, with S/N = 4 in the Lya 
forest. 



because the PDF of R50 is essentiaUy averaged across its 
two different phases of 7 = 1.2 and 7 = 1.5, thus it can 
be fit by a homogeneous model with some intermediate 
value of 7. 

As expected, we find that the PDF is degenerate be- 
tween the models R50, R25, and 150 since the only dif- 
ference is the spatial distribution of the line-of-sight seg- 
ments which intersect with hot bubbles. 

There are several uncertainties in the IGM parame- 
ters which might be degenerate with 7 in the flux PDF. 
The Jean's smoothing scale of the IGM is dependent on 
physics such as complex hydrodynamic effects and the 
temperature evo lution of the gas, wh ich are not very 
well understood. iGnedin fc Huil (|1998bl ) have argued for 
an effective smoothing scale which is approximately half 
the Jean's scale at the epoch of observatio n. This gives 
CTp.ff » 0.15ft.~^Mpc at z = 2. As the iSlosar et alJ 
(|2009f ) simulations used in this paper do not accu- 
rately capture small-scale power ((Te// = 0.25/i~^Mpc 
was used), we tur n to another set of simulations: the 
iWhite et a l. (2010) 'Roadrunner' simulations are similar 
to those of Sl osar et al.i (|2009l ) except with higher res- 
olution (0.1875 /i"^Mpc grid size vs 0.5/i"^Mpc) and 



<7eff — 0.1 /i^^ Mpc. To approximate the uncertain pres- 
sure smoothing scale on the flux PDF, we take the optical 
depth outputs of these simulations and smooth them to 
an overall smoothing scale of (Te// — 0.15 h^^ Mpc using 
a Gaussian kernel. In comparison with the original spec- 
tra, we find that the resulting flux PDFs differ by only 
-\nC{aeff = 0.15/i~iMpc|cre/7 = 0.1ft.~^Mpc) w 2, 
which is signiflcantly less than the difference caused by 
varying 7. In any case, in the future data analysis we ex- 
pect to use hydrodynamic simulations which would ob- 
viate the need to explicitly assume a value for the Jean's 
smoothing scale. 

Another systematic which could be degenerate with 7 
are the uncertainties in the value of the mean flux of the 
Lya forest (F), which we have thus far assumed to be 
fi xed. Using the somewh at smaller SDSS Data Release 
5, iDairAglio et al.l (|2009[ ) reported errors in their mea- 
surements of Teff = — ln(F) equivalent to ap ~ 0.3% 
in the mean flux at z « 2.5. We study the effect of 
this uncertainty by assuming the actual mean flux is dis- 
tributed as a Gaussian distribution with crp = 0.3%, 
and marginalizing over this when calculating the like- 
lihoods between the different values of 7. This gives 
a value of — ln£(Gl. 3101.5) w 15 in comparison with 
-~\nC{G1.3\G1.5,6{F) = 0) « 20 reported in Tabled 
This is a signiflcant effect and needs to be taken into 
account in a more formal data analysis, but does not 
qualitatively affect our conclusions. 

In general, we have found that the flux PDF from the 
latest SDSS data can be used to constrain the equation 
of state of the IGM if a homogeneous IGM is assumed. 
Indeed, the evolution of 7 with redshift can potentially 
be measured. In the SDSS DR7 quasar catalog there are 
more than 700 quasars with S/N > 4 in the Lya forest 
within redshift bins of Az — 0.3 up to Zqso ~ 3.5. The 
logarithmic likelihood —InC is roughly proportional to 
sample size, thus in comparison with our mock sample 
size of 1500 quasars at 2.4 < Zqso < 2.7 and the results 



smaller box {{750 h ^Mpcf vs {1500 h iMpc)^). The of Table [H we expect to be able to measure the IGM 

smoothing scale used in the Roadrunner simulations is 
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equation of state in these redshift bins to a precision of 
A7 ^ 0.2 with a confidence of — ln£ ~ 10 (approxi- 
mately 4cr), although it becomes more difficult to esti- 
mate accurately the quasar continuum in the Lya forest 
at higher redshifts. 

These simulations suggest that an analysis on the flux 
PDF from the SDSS sample could detect the predicted 
sup pression in the equation of state from 7 » 1.5 to 7 < 1 
(see iFurlanetto fc Oh|[2008bl: iMcQuinn et al.ll2009D due 
to He II reionization at z ~ 3. Note that this approach is 
complementary to studies which used the evolution of the 
mean optical depth Tog in the Lya forest to study He II 
reionization, since the PDF can in principle be used to 
measure 7 at fixed (F) — exp(— Tcff ). 

5. THRESHOLD PROBABILITY FUNCTIONS FROM MOCK 
SPECTRA 

In this section, we evaluate the threshold correlation 
statistics 6*2, C2, and D2 on the mock Lya forest sam- 
ple described in the previous sections. We first describe 
the form of these functions, before applying them to the 
various IGM models and investigate their ability to dis- 
tinguish between models. 

5.1. Basic Form of S2, C2, and D2 on the Lya Forest 

As we are primarily interested in breaking the de- 
generacy between the equation of state of the IGM 
and thermal inhomogeneities with comoving scales of 
^ 10 h^^ Mpc, we first smooth the mock Lya forest spec- 
tra with a Gaussian window of width a — 10/i~^Mpc. 
This has the effect of increasing the contrast from the 
temperature inhomogeneities and smoothing over the 
noise, although after smoothing the shapes of the spectra 
are still dominated by instrumental noise and the large- 
scale structure of matter. In other words, when visually 
inspecting the smoothed spectra of the same line-of-sight 
modified to the different IGM models, it is still impossi- 
ble to tell a priori which is the inhomogeneous model. 

For each value of Ftt , we first identify the pixels which 
have F > Ftu within each individual smoothed spec- 
trum, keeping track of the 'clusters' of adjoined pixels. 
S2{'r,Fth) is then calculated by counting pairs of pixels 
above Fth, and separated by correlation length r. This 
is then normalized to give a probability. We also keep 
track of C2{r, Ftt) as the probability of pixel pairs which 
are within the same cluster of pixels with F > Ftu- The 
threshold probability functions are then evaluated on a 
2-dimensional grid in Ftu and r on the SDSS DR7 mock 
samples. 

Figure [SK plots S2{Fth) and C2{Fth) at fixed r for the 
fiducial model G1.5 (recall that 5*2 = C2 -I- -D2, Equa- 
tion [5]). At large Fth few pixels in the smoothed spectra 
have sufficiently large fiux F to rise above the threshold, 
thus S2{Fth\T) tends to zero at all r as Ftu — >■ 1. Con- 
versely, as the threshold is lowered below the mean fiux 
{F), increasing numbers of pixels satisfy the criterion 
F > Fth and 5*2 trends towards unity. 



\im_S2{Fth\r) 

Fth<F 



1. 



(10) 



At small pixel separations r, the main contribution to 
5*2 comes from C2 as most pixel pairings that rise above 
the flux threshold are within the same cluster of pixels 
(recall that 'clusters' in this context refers to groupings 



of contiguous pixels and not galaxy or stellar clusters). 
Note that S2{Fth\r = 0) = C'2iFth\r = 0) is just the inte- 
gral of the flux PDF (see Equation |6]) . The contribution 
of C2 to 5*2 decreases and gives way to 02 = 82 — C2 at 
greater r as there is greater probability of flnding pixels 
from different clusters than within the same cluster. 

Figure \E}p plots S'2 and C2 as a function of r at fixed 
values of Fth- This shows more clearly the trend of C2 
dominating the probability of pixels pairs being above 
the flux threshold at small separations. The overall prob- 
ability S'2 increases as the flux threshold is lowered and 
more pixels satisfy the condition F > Fth- At separa- 
tions larger than r > 50 /i~^ Mpc, D2 dominates and we 
see the asymptotic behaviour S2{r\Fth) — >■ 5*1(0 |F(/i) as 
the pixel separations at large scales is effectively random. 
As S2{r = 0\Fth) and its square essentially measure the 
integral of the flux PDF (Equation[6]), we expect any ad- 
ditional information from S'2 to come at small correlation 
lengths r < 50, i.e. from C2. 

Figure [5] displays S2 as a density plot in the two di- 
mensions of r and Fth ■ 

5.2. Distinguishing Between IGM Models 

We compute S2 on realizations of DR7 mock sam- 
ples computed for the various IGM models introduced 
in Section [3l using the systematics and mock sample 
previously discussed (1500 lines-of-sight with S/N=4, 
flux errors ap = 9%). The mock data are evaluated 
on 2-dimensional grids in the ranges 0/i~^Mpc > r > 
80 /i"^ Mpc and 0.5 > Fth > 1-0, with 26 bins in each 
dimension. 

As was done for the PDFs in Section 14. 2( we use the 
logarithmic likelihood — In £ (Equation ^ to quantify 
the ability to differentiate the different IGM models. 
From the initial 676 data points for S2{r,Fth) from each 
model, we first remove ~ 20 points with S2 < 10"'^ to 
reduce the dynamic range, and then remove a few more 
points in order to ensure that the covariance matrix (cal- 
culated from ~ 100 mock realizations for each model) is 
well-conditioned . 

The values of — In £ between the various models are 
summarized in Table [51 In general, we see that the 
threshold probability function does a somewhat better 
job of distinguishing between the homogeneous models: 
- ln/:(G1.3|G1.5) = 54.1 between models G1.5 and G1.3 
compared with — ln£(G1.3|G1.5) = 21.8 when using the 
PDF computed for the same mock samples (Table [l]) . 

The absolute differences AS2 between the two models 
as a function of r and Fth are shown in Figure [7^, nor- 
malized by the estimated error in each point. Figure [8^ 
plots the difference between these two models as a func- 
tion of r at several fixed values of Fth ■ It can be seen that 
the plots of AS2{r\Fth) have the same general shape as 
S2{i"\Fth) (FigureEb) at the various values oi Fth shown, 
which shows that AS2 between these two models includes 
contributions from all comoving scales. 

However, S2 has some ability to break the 
degeneracy between the models G1.3 and R50 
with -ln/:(i?50|G1.3) = 14.1, compared with 
-ln/:(i?50|G1.3) = 2.3 using the PDF. Figure Wfl 
shows \AS2{r,Fth)\ for models R50 and G1.3, and 
we see that the differences are relatively subtle with 
\AS2{r,Fth)\ < la. In FigureH), the plots of AS2(r|Ft,.) 
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(a) S2 and C2 at fixed r (b) S2 and C2 at fixed F,^, 
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Fig. 5. — (a) 52 (solid black lines) and C'2 (dashed red lines) from model G1.5 plotted as a function of flux threshold Fth at fixed comoving 
distances (from top to bottom) r = 0?t~^ Mpc, 15/i~^ Mpc, 60/i~^ Mpc. Note that S2 and C2 overlap in the case of r = 0/i~^Mpc 
(uppermost plot), (b) 52 and C2 plotted as a function of comoving distance r at fixed flux threshold Fth = 0.6,0.75,0.85,0.9 from top to 
bottom. See electronic edition of the Journal for a color version of this figure. 

have a smaUer characteristic size then they are harder 
to distinguish from the closest homogeneous model: 
— ln£(i?25|G1.3) == 7.2 which would be a marginal 
detection. 

As we have discussed, the temperature inhomogeneities 
in the IGM are too subtle to overcome the density field 
which dominates the optical depth distribution of the 
IGM, but regions of higher temperature such as those 
shown in Figure [3] can broaden the width of pixel clus- 
ters that rise above the flux threshold Fth when averaged 
across many spectra, increasing iS'2 at the scales repre- 
sented by the various path lengths at which the Lya lines- 
of-sight intersect the hot bubbles. The shape of A 6*2 in 
Figure [5}d represents a slight broadening of C2 from the 
hot bubbles in the model. While a homogeneous IGM 
with 7 « 1.3 can provide a flux PDF that is indistinguish- 
able from an inhomogeneous model, the threshold cor- 
relation functions measures sufficient spatial information 
to detect the temperature inhomogeneities and break the 
degeneracy with the best-fit homogeneous model. 

In addition, we see from Table [2] that the model R25 
with smaller hot bubbles is difficult to distinguish from 
model R50 with - ln£(i?25|i?50) = 4.5, but the fact 
that — In C is not of order unity indicates that iS'2 en- 
codes some information on the characteristic scale of the 
temperature inhomogeneities. Looking at the inverted 
bubble model 150, we see that it is nearly degenerate 
with R50, with —lnC = 3.2 between the two models. 
Nevertheless, it would appear that all the inhomogeneous 
models can be differentiated from G1.3 with — ln£ > 10 
in comparison with — ln£ ~ 1 when using the PDF. 
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Fig. 6. — 2-dimensional density plot of 52 as a function of F and 
r, evaluated for one mock sample based on the model G1.5. 



TABLE 2 
— ln£ BETWEEN S2{r,Fth) FOR DIFFERENT IGM 
MODELS. ASSUMING 9% FLUX CONTINUUM ERROR 
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Note. — Assumes mock SDSS DR7 data set of 
1500 quasars at 2 = 2.5, with S/N = 4 in the Lyo 
forest. 



shows that the deviations do not trace the shape of 
S^iA^th), but have a hump-like shape peaking at 
scales of r ~ 20/i~^Mpc and extending well into the 
larger scales where S2{r\Fth) goes flat in Figure [S]d . A 
comparison with Figure ^p suggests that most of this 
contribution comes from C2. However, if the bubbles 



5.3. Improved Flux Continuum Estim,ates 

In our mock spectra we have so far assumed what we 
regard as a worst-case scenario for estimating the flux 
continuum of the Lya forest, with errors o f arou nd 9%. 
This was the uncertainty that ISuzuki et al.l ()2005l ) found 
from PCA fitting on the intrinsic quasar spectrum red- 
wards of the Lya emission line and without using any 
information from the Lya forest itself. 

There are various possibilities to improve on the flux 
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Fig. 7. — (a) Absolute diflference between 52 from models G1.3 and G1.5 as a function of r and F^h, normalized by the error from the 
former, (b) Absolute difference between ^2 from models R50 and G1.3. Note different intensity scale from plot on the left. See electronic 
edition of the Journal for a color version of this figure. 
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TABLE 3 

— ln£ BETWEEN S2{r,Ftfi) FOR DIFFERENT fCM 
MODELS, ASSUMING 3% FLUX CONTINUUM ERROR 



Fig. 8. — (a) Difference between 52 from models G1.3 and G1.5, plotted as a function of correlation length r at fixed values of Ffi^ = 0.65 
(black diamonds), Fth = 0.75 (green triangles), and Fth = 0.85 (blue squares). The points are offset horizontally for clarity, (b) Difference 
between 52 from models R50 and G1.3, as a function of r at fixed values of Fn^ = 0.78 (black diamonds) and Ft/^ = 0.82 (green triangles). 
The different shape of (b) compared to (a) indicates the biasing of 52 towards larger r due to the presence of temperature inhomogeneities 
in R50. See electronic edition of the Journal for a color version of this figure. 

continuum fits. One method is to assume that the mean 
flux F for each Lya forest is equal to the global mean 
flux (F{z)) at the corresponding redshift, and normal- 
ize the observed quasar continuum to this value (N. 
Suzuki, private communication). In this way, the er- 
rors in the continuum fit would be limited to a combi- 
nation of cosmic variance and errors in the determina- 
tion of the mean flux. The dispersion in the mean flux 
between different lines-of-sight is of order 1-2%, while 
within an individual line-of-sight with e.g. S/N '^ 4 
across '^ 500 pixels the mean flux can be determined 
to about {S/N X ^{500))-^ ~ 1%. In principle, this 
yields an error on the continuum determination at the 
few percent level (N. Suzuki, private communication). 

In this subsection, we compute the threshold probabil- 
ity functions for mock spectra with assumed flux contin- 
uum errors of ap = 3%, which is an optimistic estimate 
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Note. — Assumes mock SDSS DR7 data set of 1500 
quasars at z = 2.5, with S/N = 4 in the Lya forest. 

of the precision believed possible with the mean flux flt- 
ting method. In all other respects, the properties of our 
mock sample are unchanged from the previous sections. 
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Fig. 9. — (a) Absolute difference between 52 from models R50 and G1.3 as a function of r and Fth,, normalized by the error from the 
former. These were calculated assuming 3% errors in the continuum determination, (b) Absolute difference between 52 from models R25 
and G1.3, assuming 3% continuum errors. Note that the differences in (b) peak at smaller scales. 



Table |3] summarizes the logarithmic likelihoods for dif- 
ferentiating S2 calculated from any two IGM models. 
Overall, the values of — In C see a marked improvement 
when compared with the values in Table [2] computed as- 
suming 9% flux continuum errors. It is now possible 
to tell the inhomogeneous models apart from the model 
G1.3 with significant confidence, with — ln£ > 30. 

More importantly, S2 can now place significant con- 
straints on the characteristic scale of the He II reioniza- 
tion bubbles, as - ln/:(i?25|i?50) = 15.1. The differences 
can be clearly seen in the two-dimensional differences 
plots of S2{r,Fth) of these models with respect to that 
calculated from the best-fit homogeneous model G1.3, 
Figure [HI The deviations from the model G1.3 peak at 
smaller scales r in the case of R25, Figure [Hbj compare 
R50, Figure [9^. This effect can be seen more emphat- 
ically in Figure [TOl which shows AS'2 between R25 and 
R50 plotted at several flux threshold values. At the flux 
threshold values shown, R50 displays a distinct increase 
in S2 at larger scales compared with R25. 

Even with the improved precision for the flux contin- 
uum, 5*2 is still unable to distinguish between R50 and its 
topologically inverted counterpart 150, with — ln>C = 2.7 
for differentiating the two models. This is not surprising, 
as Figure [5] shows that the size distribution of hot and 
cold regions in the R50 and 150 boxes are similar due to 
the equal volume fractions. 

6. DISCUSSION & CONCLUSION 

In this paper we have introduced to astrophysics a 
set of new correlation statistics, S2, C'2, and D2, that 
are evaluated as two-dimensional functions of correla- 
tion length r and transmitted flux threshold Fth- These 
'threshold probability functions' were tested on mock 
Lya forest spectra in which instrumental noise and sys- 
tematics were added at a level appropriate to the SDSS 
Lya forest data, and we have made assumptions on the 
fl ux continuum errors based on the PGA fltting method 
of lSuzuki et"aI1 ((200^) . 

6.1. Flux PDF 



S2(R50) - S2(R25) at fixed F„ 
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Fig. 10. — Difference between 52 from models R50 and R,25, plot- 
ted as a function of correlation length r at fixed values of Fti^ = 0.75 
(black diamonds) and Ftj^ = 0.85 (green triangles). The points are 
offset horizontally for clarity. See electronic edition of the Journal 
for a color version of this figure. 

As the threshold correlation statistics can be thought 
of as the flux probability distribution function (PDF) 
measured as a function of spatial scale, we first computed 
the flux PDF in order to check that the errors from our 
mock spectra are comparable with those of the observed 
PDF from SDSS spectra published in Desiacques et al] 
(|2007h . At the level of uncertainty arising from the latest 
SDSS data set, we find that with the assumption of a 
homogeneous IGM the flux PDF can in fact constrain 
the IGM equation of state, 7, to A7 ~ 0.1 at the ^ 4a 
level in redshift bins of Az ^ 0.3. This would allow 
the redshift evolution of 7 to be constrained through the 
epoch of He II reionization at z « 3. 

To-date, the best constraints on the equation of state 
7 of the IGM have come from flux PDFs measured from 
small numbers (~ 20) of high-resolution Lya forest spec- 
tra, which have favored an in verted equati o n of state 
7 < (,Becker et al, ,2007k ,Kim et al.l [20071: iViel et al.l 
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l2009f) . In a subsequent paper we aim to measure the 
flux PDF from SDSS DR7, and in conjunction with nu- 
merical simulations and detailed modelling of systemat- 
ics make independent measurements of 7 as a function of 
redshift. This would be an interesting complement to the 
studies using the mean flux evolution in the Lya forest, 
which have placed the completi on of He II reionization 
at z w 3.2 (jBernardiet al.ll2003D . 

Baryon Oscillation Spectroscopic Survey (BOSS), part 
of the third phase of the SDSS (SDSS-III), aims to ob- 
serve ~ 150,000 quasar lines-of-sight. While many of 
these Lya forest spectra will be of low signal-to-noise 
(S/N ~ 1), there will be sufhcient numbers of moderate 
S/N spectra that even better constraints can be made on 
the He II end-of-reionization epoch than the DR7 sample 
considered in this paper. 

6.2. Threshold Probability Functions 

We have tested the threshold probability functions on 
the mock spectra, and found that they are able to break 
the degeneracy between an IGM with temperature in- 
homogeneities and the best-fit homogeneous model of 
7 = 1.3, at a confidence of ~ 5a. 

If the errors in the continuum fitting can be reduced 
to ap w 3%, then S2 is sensitive to the characteris- 
tic scale of the temperature inhomogeneities, being able 
to distinguish between toy models with 50 h^^ Mpc and 
25 h^^ Mpc bubbles at -- 5a. 

In this paper we have taken a simplified approach 
towards the instrumental systematics and noise in the 
mock spectra. This has provided adequate estimates 
for our errors, but would be insufficient to constrain the 
IGM from real data. Some of the systematics we have 
not taken into account include the possible influence of 
damped Lya (DLA) regions in the Lya forest, metal line 
absorption, etc. All these need to be modelled in detail 
when analyzing real data. As for the flux continuum fit- 
ting, we hope to use methods such as mean-fiux fitting 
to reduce the errors to several percent, which would en- 
able the threshold probability functions to constrain the 
physical scale of any thermal inhomogeneities in addition 
to making a detection. Whatever the method used, the 
residual errors that arise need to be well-characterized in 
order to be modelled in the data analysis. 

The mock spectra used in this paper have been gen- 
erated using dark matter-only simulations that do not 
capture detailed IGM physics, and various toy models 
for the IGM have been 'painted' on to the basic set of 
mock spectra. In the actual data analysis we anticipate 
fitting the data to mock spectra based on more detailed 
numerical simulations that include the physics of He II 
reionization. This would allow, in addition to a direct 



measurement of the temperature range AT and spatial 
scale of the inhomogeneities, constraints to be placed on 
the underlying physical mechanisms such as the quasar 
luminosity function and duty cycle, background ioniza- 
tion rate, gas clumping factors etc. 

In the near future, the high area density of the BOSS 
quasars will enable correlation studies in the transverse 
direction between quasar pairs. It would be straight- 
forward to extend the threshold probability functions to 
work in both the parallel and transverse directions rela- 
tive to line-of-sight in order to utilize the full power of the 
BOSS data. In addition, transverse studies would ame- 
liorate the effects of the uncertain fitting of the quasar 
continuum. However, we defer the three-dimensional 
generalization of the threshold probability functions to 
a future paper. 

6.3. Summary 

Using mock Lya forest spectra based on toy models 
of the IGM and simulations of the existing SDSS data, 
we have shown that detailed statistical analysis of these 
spectra can provide insight into the physics of the IGM: 

• The flux PDF from the SDSS DR7 can place 
significant constraints on the equation of state 7 
of a homogeneous IGM to A7 ^ 0.1 at z ss 2.5, 
and track its evolution through the end of He II 
reionization. 



• We have introduced the threshold probability 
functions ^2, C2, and D2, which measure the Lya 
forest as functions of fiux level and spatial scale, 
and have shown that they can differentiate an in- 
homogeneous IGM from the best-fit homogeneous 
model at > 3a. 



• If the flux continuum fitting can be carried out to 
« 3% accuracy, the threshold statistics can place 
constraints on the characteristic scale of the tem- 
perature inhomogeneities. 



The authors thank Sal Torquato, Matt McQuinn, 
Michael Strauss and Nao Suzuki for useful discussions 
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per. 
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